Introduction to Phylogenetic Comparative Methods

Dean Adams, Iowa State University

26 May, 2020

Phenotypic Diversity

Morphometricians wish to explain the evolution of phenotypic diversity

How do we characterize patterns, and hypothesize processes in an evolutionary context?

Macroevolution of Phenotypic Diversity

Understanding morphological trends requires:

1: Methods for quantifying morphology (morphometrics)

2: Statistical models & tools for testing hypotheses in a phylogenetic framework

Quantifying Phenotypes: The Procrustes Paradigm

GMM: A mathematical merger of statistical shape theory & analytics for extracting and analyzing shape from landmark data (via GPA)

Procedures generate a set of shape variables for subsequent analyses*

* Note: considerable statistical theory was required to derive appropriate analytics for such data

How can we leverage GMM to address macroevolutionary hypotheses?

Comparative Biology Tradition

Trait correlations often used to study coevolution and adaptation

The problem? Species are not independent of one another

One must account for this via phylogenetic comparative methods

Why is There a Problem?

Let’s simulate phenotypic traits on a phylogeny under Brownian motion:

Closely related species tend to resemble one another (i.e., \(\small{E(\Sigma_{T1,T2})>0}\))

A Worst Case Scenario

Traits appear correlated, though simulated independently
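This pattern is easy to reproduce. Below is a minimal numpy sketch (the two-clade covariance structure is hypothetical, not the tree from the figure): two traits are simulated completely independently under Brownian motion, yet their correlation across species is routinely far from zero.

```python
import numpy as np

# Phylogenetic covariance for two clades of 8 species each (total depth 1.0):
# species within a clade share 0.9 of their history; species in different
# clades share none, so V is block-diagonal.
n_clade = 8
block = np.full((n_clade, n_clade), 0.9) + np.eye(n_clade) * 0.1
V = np.block([[block, np.zeros((n_clade, n_clade))],
              [np.zeros((n_clade, n_clade)), block]])

rng = np.random.default_rng(1)
L = np.linalg.cholesky(V)
x = L @ rng.standard_normal(2 * n_clade)  # trait 1, simulated under BM
y = L @ rng.standard_normal(2 * n_clade)  # trait 2, simulated independently

r = np.corrcoef(x, y)[0, 1]
print(r)  # often far from zero across replicates, despite independence
```

Across many replicates the average |r| is large, because the 16 species contribute roughly two independent data points (one per clade), not sixteen.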

Conclusion: We must account for phylogeny during comparative analyses

Phylogenetic Comparative Methods (PCMs)

PCMs condition the data on the phylogeny during the analysis

Empirical Goal: Evaluate evolutionary hypotheses while accounting for (phylogenetic) non-independence

Requires an evolutionary model of how trait variation is expected to accumulate

PCMs: Brownian Motion

A useful null model of trait change: the expected change is zero (\(\small\Delta\mu=0\)), but variance accumulates in proportion to elapsed time (\(\small\sigma^2\propto{time}\))

Side-note: this is the continuous-trait model equivalent of the Markov process, and is intimately related to Gaussian theory and the normal distribution

see Felsenstein (1973; 1985)
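A quick numeric check of the BM expectations (simulation parameters are illustrative): the mean change among lineages stays near zero while the variance grows in proportion to elapsed time.

```python
import numpy as np

# Brownian motion: expected change is zero; variance grows linearly with time.
# Simulate many independent lineages for t = 1.0 time units.
rng = np.random.default_rng(0)
sigma2 = 1.0                 # rate parameter (illustrative)
n_steps, dt = 100, 0.01      # total elapsed time t = n_steps * dt = 1.0
paths = np.cumsum(rng.standard_normal((20000, n_steps)) * np.sqrt(sigma2 * dt),
                  axis=1)

print(paths[:, -1].mean())   # ~0    (no directional change)
print(paths[:, -1].var())    # ~1.0  (variance ~ sigma2 * t at t = 1.0)
print(paths[:, 49].var())    # ~0.5  (variance ~ sigma2 * t at t = 0.5)
```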

PCMs: An Incomplete Historical Walk

The conceptual development of PCMs

Phylogenetic Comparative Method Toolkit

The methods of PCM: Analytical tools to address particular evolutionary hypotheses

Today we will focus on Phylogenetic Linear Models (regression/ANOVA)

PCM: General Model

Most PCMs use GLS (generalized least squares) as a model:

\(\small\mathbf{V}\) is the phylogenetic covariance matrix: it describes the amount of evolutionary time species share via common ancestry (and thus how similar they are expected to be)

Sometimes V is called C (particularly when derived under Brownian motion)
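How \(\small\mathbf{V}\) is built under Brownian motion can be sketched directly (a hypothetical 4-taxon balanced tree of depth 1.0): each off-diagonal entry is the depth of the most recent common ancestor of the two species, and each diagonal entry is the species' total path length from the root.

```python
import numpy as np

# Under Brownian motion, V[i, j] is the shared evolutionary time of species
# i and j: the depth (from the root) of their most recent common ancestor.
# Hypothetical tree: ((A:0.5, B:0.5):0.5, (C:0.5, D:0.5):0.5)
taxa = ["A", "B", "C", "D"]
mrca_depth = {("A", "B"): 0.5, ("C", "D"): 0.5}  # sister pairs share 0.5
total_depth = 1.0

V = np.zeros((4, 4))
for i, ti in enumerate(taxa):
    for j, tj in enumerate(taxa):
        if i == j:
            V[i, j] = total_depth                    # full root-to-tip time
        else:
            key = tuple(sorted((ti, tj)))
            V[i, j] = mrca_depth.get(key, 0.0)       # 0 if the MRCA is the root
print(V)
```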

Motivating a Solution

One analytical solution can be appreciated by the following:

From the circles above, one can envision a set of contrasts between sister taxa that are independent given the phylogeny (see Felsenstein 1985)

Phylogenetically Independent Contrasts

Estimate contrast scores between pairs of taxa (tips or nodes)

Use contrasts for analyses (OLS solution)

Coefficients found as: \(\small\hat{\mathbf{\beta }}=\left ( \mathbf{X}^{T}_{pic} \mathbf{X}_{pic}\right )^{-1}\left ( \mathbf{X}^{T}_{pic} \mathbf{Y}_{pic}\right )\)

see Felsenstein. Am. Nat. (1985)
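The contrast calculations can be sketched as follows (hypothetical 4-taxon tree ((A:1,B:1):1,(C:1,D:1):1) and trait values; the pruning recursion is written out by hand rather than over a general tree structure):

```python
import numpy as np

# Felsenstein's (1985) contrasts for one trait, computed node by node.
def pic_pair(x1, v1, x2, v2):
    """Contrast between two daughters; returns (contrast, ancestral
    estimate, extra length added to the ancestor's branch)."""
    contrast = (x1 - x2) / np.sqrt(v1 + v2)
    anc = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)  # weighted ancestral value
    extra = v1 * v2 / (v1 + v2)                    # branch-length correction
    return contrast, anc, extra

x = {"A": 1.0, "B": 3.0, "C": 2.0, "D": 6.0}       # hypothetical trait values
c1, anc1, e1 = pic_pair(x["A"], 1.0, x["B"], 1.0)  # node joining A and B
c2, anc2, e2 = pic_pair(x["C"], 1.0, x["D"], 1.0)  # node joining C and D
c3, _, _ = pic_pair(anc1, 1.0 + e1, anc2, 1.0 + e2)  # root contrast
print(c1, c2, c3)
```

The resulting contrasts (one fewer than the number of taxa) are then used in an ordinary regression through the origin.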

Phylogenetically Independent Contrasts: Example

What is the association between Y and X?

## Analysis of Variance Table
## 
## Response: pic.y
##           Df Sum Sq Mean Sq F value Pr(>F)
## pic.x      1  14.35   14.35    19.3 0.0046
## Residuals  6   4.47    0.74
## pic.x 
## 0.885

Phylogenetic Generalized Least Squares (PGLS)

Alternative implementation: Use GLS model with non-independent error structure

Statistical model: \(\small\mathbf{Y}=\mathbf{X}\mathbf{\beta } +\mathbf{E}\) where \(\small\mathbf{E} \sim \mathcal{N}(0,\textbf{V})\)

\(\textbf{V}\) is the phylogenetic covariance matrix

Coefficients found as: \(\small\hat{\mathbf{\beta }}=\left ( \mathbf{X}^{T} \mathbf{V}^{-1} \mathbf{X}\right )^{-1}\left ( \mathbf{X}^{T} \mathbf{V}^{-1}\mathbf{Y}\right )\)

*Note: results identical to PIC (if implemented correctly)

see Grafen. Phil. Trans. Roy. Soc. (1989)
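The GLS estimator can be sketched directly (hypothetical \(\mathbf{V}\), \(\mathbf{X}\), and \(\mathbf{Y}\) for four species):

```python
import numpy as np

def pgls_coef(X, Y, V):
    """GLS coefficients: (X' V^-1 X)^-1 (X' V^-1 Y)."""
    Vi = np.linalg.inv(V)
    return np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ Y)

# Hypothetical 4-species data with a simple two-clade covariance matrix
V = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.5],
              [0.0, 0.0, 0.5, 1.0]])
X = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0]])  # intercept + predictor
Y = np.array([2.0, 3.0, 5.0, 8.0])
print(pgls_coef(X, Y, V))
```

With \(\mathbf{V}=\mathbf{I}\) the same formula collapses to ordinary least squares, which makes the role of the phylogenetic covariance explicit.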

PGLS: Example

## Denom. DF: 6 
##             numDF F-value p-value
## (Intercept)     1    12.9  0.0115
## X               1    19.3  0.0046
## (Intercept)           X 
##      -2.330       0.885

Identical results to PICs!

Phylogenetic Transformation

Alternative implementation: Utilize OLS transformation of GLS models

Phylogenetic transformation matrix: \(\small\mathbf{T}=\left ( \mathbf{U}\mathbf{W}^{1/2} \mathbf{U}^{T}\right )^{-1}=\mathbf{U}\mathbf{W}^{-1/2} \mathbf{U}^{T}\)

U and W are eigenvectors and eigenvalues of V

Transformed data: \(\small\tilde{\mathbf{Y}}=\mathbf{TY}\)

Transformed design matrix: \(\small\tilde{\mathbf{X}}=\mathbf{TX}\)

Coefficients solved as: \(\small\hat{\mathbf{\beta }}=\left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{X}}\right )^{-1}\left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{Y}}\right )\)

*Note: results identical to PIC & PGLS (if implemented correctly)

see Garland and Ives. Am. Nat. (2000); Adams. Evol. (2014); Adams and Collyer. Evol. (2018)
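The transformation and its equivalence to GLS can be checked numerically, taking \(\mathbf{T}=\mathbf{U}\mathbf{W}^{-1/2}\mathbf{U}^{T}\) (the inverse square root of \(\mathbf{V}\)); the data below are hypothetical:

```python
import numpy as np

# Phylogenetic transformation: T = U W^{-1/2} U', with U, W the
# eigenvectors/eigenvalues of V.  OLS on the transformed data reproduces
# the GLS (PGLS) coefficients.
V = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.5],
              [0.0, 0.0, 0.5, 1.0]])
W, U = np.linalg.eigh(V)
T = U @ np.diag(W ** -0.5) @ U.T

X = np.column_stack([np.ones(4), [1.0, 2.0, 3.0, 4.0]])
Y = np.array([2.0, 3.0, 5.0, 8.0])
Xt, Yt = T @ X, T @ Y                                   # transformed data

beta_ols = np.linalg.solve(Xt.T @ Xt, Xt.T @ Yt)        # OLS on transformed data
Vi = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ Y)  # direct GLS
print(np.allclose(beta_ols, beta_gls))  # True
```

The equivalence holds because \(\tilde{\mathbf{X}}^{T}\tilde{\mathbf{X}}=\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{X}\) and \(\tilde{\mathbf{X}}^{T}\tilde{\mathbf{Y}}=\mathbf{X}^{T}\mathbf{V}^{-1}\mathbf{Y}\).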

Phylogenetic Transformation: Example

##           Df    SS    MS   Rsq    F    Z Pr(>F)
## X          1 14.35 14.35 0.763 19.3 1.82  0.002
## Residuals  6  4.47  0.74 0.237                 
## Total      7 18.82
## (Intercept)           X 
##      -2.330       0.885

Identical results to PICs & PGLS!

Assessing Significance

PIC, PGLS, and Phylogenetic transform yield identical parameter estimates

\[\tiny \hat{\mathbf{\beta }}=\left ( \mathbf{X}^{T} \mathbf{V}^{-1} \mathbf{X}\right )^{-1}\left ( \mathbf{X}^{T} \mathbf{V}^{-1}\mathbf{Y}\right )\]

\[\tiny =\left ( \mathbf{X}^{T}_{pic} \mathbf{X}_{pic}\right )^{-1}\left ( \mathbf{X}^{T}_{pic} \mathbf{Y}_{pic}\right )\]

\[\tiny =\left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{X}}\right )^{-1}\left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{Y}}\right )\]

\[\tiny \log{L}=\log{ \left(\frac{exp{\left({-\frac{1}{2}{\left({\mathbf{Y}-E(\mathbf{Y})}\right)^{T}}} \mathbf{V}^{-1}\left({\mathbf{Y}-E(\mathbf{Y})}\right)\right)}} {\sqrt{\left({2\pi}\right)^{Np}\times\begin{vmatrix} \mathbf{V} \end{vmatrix}}}\right)}\]

Assessing Significance

PROBLEM: Parametric significance testing suffers from Rao’s paradox

Power reduces as p-dimensions increase

Why does this happen?

Adams. Evol. (2014); Adams and Collyer. Syst. Biol. (2018); Adams and Collyer. Ann. Rev. Ecol. Evol. Syst. (2019)

What’s In a Likelihood?

One can understand this result by appreciating the likelihood equation:

\[\tiny \log{L}=\log{ \left(\frac{exp{\left({-\frac{1}{2}{\left({\mathbf{Y}-E(\mathbf{Y})}\right)^{T}}} \mathbf{V}^{-1}\left({\mathbf{Y}-E(\mathbf{Y})}\right)\right)}} {\sqrt{\left({2\pi}\right)^{Np}\times\begin{vmatrix} \mathbf{V} \end{vmatrix}}}\right)}\]

We see there are three components to \(\small\log{L}\):

| Element | Component | Meaning |
|:---:|:---|:---|
| 1 | \(\tiny\frac{Np}{2}\log{2\pi}\) | A constant |
| 2 | \(\tiny\frac{1}{2}{\left(\mathbf{Y}-E(\mathbf{Y})\right)^{T}\mathbf{V}^{-1}\left(\mathbf{Y}-E(\mathbf{Y})\right)}\) | Summed squared deviations from the predicted values |
| 3 | \(\tiny\frac{1}{2}\log\begin{vmatrix} \mathbf{V} \end{vmatrix}\) | The error covariance of the model |

Surely #2 & #3 vary across different models? … WRONG!!

Likelihood: Component Parts

Let’s look at: \(\tiny\frac{1}{2}{\left(\mathbf{Y}-E(\mathbf{Y})\right)^{T}\mathbf{V}^{-1}\left(\mathbf{Y}-E(\mathbf{Y})\right)}\) (univariate OLS example for simplicity):

\(\tiny\mathbf{Y}=\begin{bmatrix} 6\\ 7\\ 4\\ 5\\ 9\\ 3\\ 2\\ 5 \end{bmatrix}\) \(\tiny\mathbf{X_{0}}=\begin{bmatrix} 1\\ 1\\ 1\\ 1\\ 1\\ 1\\ 1\\ 1 \end{bmatrix}\) \(\tiny\mathbf{X_{1}}=\begin{bmatrix} 1 & 1\\ 1& 3\\ 1& 3\\ 1& 4\\ 1& 5\\ 1& 6\\ 1& 7\\ 1& 8 \end{bmatrix}\)

y <- matrix(c(6, 7, 4, 5, 9, 3, 2, 5))       # response
N <- nrow(y)
x0 <- matrix(1, N, 1)                        # intercept-only design (X0)
x1 <- cbind(1, c(1, 3, 3, 4, 5, 6, 7, 8))    # design with a slope (X1)
beta0 <- solve(t(x0) %*% x0) %*% t(x0) %*% y
sigma0 <- (t(y - x0 %*% beta0) %*% (y - x0 %*% beta0)) / N
(t(y - x0 %*% beta0) %*% (y - x0 %*% beta0)) / (2 * sigma0)
##      [,1]
## [1,]    4
beta1 <- solve(t(x1) %*% x1) %*% t(x1) %*% y
sigma1 <- (t(y - x1 %*% beta1) %*% (y - x1 %*% beta1)) / N
(t(y - x1 %*% beta1) %*% (y - x1 %*% beta1)) / (2 * sigma1)
##      [,1]
## [1,]    4

Wait! What? For two different models this component is IDENTICAL!

ALGEBRA, please provide us with wisdom…

Likelihood: Component Parts

And algebra answers…“Yes I will!”

Say \(\small\mathbf{X}\) is a vector of ones and \(\small\mathbf{Y}\) is univariate. Then for OLS models \(\small\mathbf{X\beta}\) is a vector whose elements are the mean \(\small\mu_{Y}\)

Thus, \(\small{\left(\mathbf{Y}-\mathbf{X\beta}\right)^{T}\left(\mathbf{Y}-\mathbf{X\beta}\right)}\) is \(\small\mathbf{SS}_{\epsilon}\)

Thus we have: \(\small\frac{1}{2\sigma^{2}}{\left(\mathbf{Y}-\mathbf{X\beta}\right)^{T}\left(\mathbf{Y}-\mathbf{X\beta}\right)}=\frac{\left(\mathbf{Y}-\mathbf{X\beta}\right)^{T}\left(\mathbf{Y}-\mathbf{X\beta}\right)}{2\frac{\mathbf{E}^{T}\mathbf{E}}{n}}\)

Re-arranging gives us \(\small=\frac{\left(\mathbf{Y}-\mathbf{X\beta}\right)^{T}\left(\mathbf{Y}-\mathbf{X\beta}\right)}{\frac{2}{n}\left(\mathbf{Y}-\mathbf{X\beta}\right)^{T}\left(\mathbf{Y}-\mathbf{X\beta}\right)}=\frac{n}{2}\)

REGARDLESS of the data in Y \(\small\frac{1}{2\sigma^{2}}{\left(\mathbf{Y}-\mathbf{X\beta}\right)^{T}\left(\mathbf{Y}-\mathbf{X\beta}\right)}\) is the constant: \(\small\frac{n}{2}\)

This result holds for multivariate and GLS models as well, where the general solution is \(\small\frac{np}{2}\).
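A numeric confirmation (arbitrary simulated data): the quadratic form scaled by the ML estimate \(\hat{\sigma}^2=\mathbf{E}^{T}\mathbf{E}/n\) returns \(n/2\) regardless of what data or design matrix are used.

```python
import numpy as np

# The quadratic form divided by 2*sigma^2 (with sigma^2 = E'E / n, the ML
# estimate) equals n/2 for ANY response and any design matrix.
rng = np.random.default_rng(42)
n = 30
X = np.column_stack([np.ones(n), rng.standard_normal(n)])  # arbitrary design
Y = rng.standard_normal(n)                                 # arbitrary response

beta = np.linalg.solve(X.T @ X, X.T @ Y)
E = Y - X @ beta
sigma2 = (E @ E) / n
print((E @ E) / (2 * sigma2))   # n/2 = 15 (up to floating point)
```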

Revisiting the Likelihood

Our likelihood thus contains:

\[\tiny \log{L}=\log{ \left(\frac{exp{\left({-\frac{1}{2}{\left({\mathbf{Y}-E(\mathbf{Y})}\right)^{T}}} \mathbf{V}^{-1}\left({\mathbf{Y}-E(\mathbf{Y})}\right)\right)}} {\sqrt{\left({2\pi}\right)^{Np}\times\begin{vmatrix} \mathbf{V} \end{vmatrix}}}\right)}\]

| Element | Component | Meaning |
|:---:|:---|:---|
| 1 | \(\tiny\frac{Np}{2}\log{2\pi}\) | A constant |
| 2 | \(\tiny\frac{1}{2}{\left(\mathbf{Y}-E(\mathbf{Y})\right)^{T}\mathbf{V}^{-1}\left(\mathbf{Y}-E(\mathbf{Y})\right)}\) | A constant |
| 3 | \(\tiny\frac{1}{2}\log\begin{vmatrix} \mathbf{V} \end{vmatrix}\) | The error covariance of the model |

Thus, for parametric statistical hypothesis testing, all logL estimates (and, consequently, derived summary measures such as AIC) are based on the ONE component that is free to vary across models: \(\small\frac{1}{2}\log\begin{vmatrix} \mathbf{V} \end{vmatrix}\)!

And this component becomes singular as \(N\rightarrow{p}\)!
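The singularity is easy to demonstrate (arbitrary data, with dimensions chosen so that \(p > n\)): the sample error covariance matrix has rank at most \(n-1\), so its determinant vanishes once \(p\) reaches \(n\).

```python
import numpy as np

# With p traits measured on n species, the sample covariance matrix has
# rank at most n - 1; once p >= n it is singular, and the determinant --
# the only model-dependent piece of logL -- is zero.
rng = np.random.default_rng(0)
n, p = 10, 12
Y = rng.standard_normal((n, p))
S = np.cov(Y, rowvar=False)          # p x p sample covariance

print(np.linalg.matrix_rank(S))      # n - 1 = 9, which is < p = 12
```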

We need an alternative approach

PCMs for Multivariate Data

Forgo standard ML and parametric approaches for statistical evaluation, and use robust methods

New GMM/PCM Approach:

1: Condition data on phylogeny & fit model parameters

2: Obtain robust summary measures (avoid \(\begin{vmatrix} \mathbf{V} \end{vmatrix} = 0\))

3: Evaluate significance and effect sizes NOT using logL

Condition Data on Phylogeny

We utilize the phylogenetic transformation procedure above

Phylogenetic transformation matrix: \(\small\mathbf{T}=\left ( \mathbf{U}\mathbf{W}^{1/2} \mathbf{U}^{T}\right )^{-1}=\mathbf{U}\mathbf{W}^{-1/2} \mathbf{U}^{T}\)

U and W are eigenvectors and eigenvalues of V

Transformed data: \(\small\tilde{\mathbf{Y}}=\mathbf{TY}\)

Transformed design matrix: \(\small\tilde{\mathbf{X}}=\mathbf{TX}\)

Coefficients solved as: \(\small\hat{\mathbf{\beta }}=\left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{X}}\right )^{-1}\left ( \mathbf{\tilde{X}}^{T} \mathbf{\tilde{Y}}\right )\)

see Adams. Evol. (2014); Adams and Collyer. Evol. (2018)

Robust Summary Measures

Leverage geometry to obtain robust summary statistics

One way: Sums-of-squares from object distances*

Avoids \(\begin{vmatrix} \mathbf{V} \end{vmatrix} = 0\), but still obtains: \(SS\), \(MS\), \(F\), \(R^2\), etc.

*Note: approach also used for Goodall’s F-test

See: Adams (2014a, 2014b, 2014c); Adams and Collyer (2015, 2018a, 2018b, 2019)
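The distance-based SS can be verified against the conventional trace-based SS (arbitrary data, deliberately with more variables than objects):

```python
import numpy as np

# Sums-of-squares from pairwise distances among objects:
#   SS = (1/n) * sum over i<j of d_ij^2
# equals the usual SS around the mean (trace form), with no determinant
# of a trait covariance matrix required.
rng = np.random.default_rng(3)
n, p = 8, 40                        # p > n poses no problem here
Y = rng.standard_normal((n, p))

Yc = Y - Y.mean(axis=0)
ss_trace = np.sum(Yc ** 2)          # conventional SS around the mean

D2 = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)  # squared distances
ss_dist = D2[np.triu_indices(n, 1)].sum() / n               # distance-based SS

print(np.allclose(ss_trace, ss_dist))  # True
```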

Assessing Significance: RRPP

RRPP and Power

RRPP Properties
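A minimal univariate sketch of the RRPP idea (a simple regression is assumed here; the published procedure generalizes to multivariate and phylogenetically transformed data): residuals from the reduced model are permuted, added back to its fitted values, and the test statistic is recomputed each iteration to build a null distribution.

```python
import numpy as np

def rrpp_pvalue(x, y, n_perm=999, seed=0):
    """Randomized residual permutation procedure (univariate sketch)."""
    rng = np.random.default_rng(seed)
    X_full = np.column_stack([np.ones(len(x)), x])
    X_red = np.ones((len(x), 1))                  # reduced model: intercept only

    def ss_resid(X, y):
        b = np.linalg.lstsq(X, y, rcond=None)[0]
        r = y - X @ b
        return r @ r

    def stat(y):
        return ss_resid(X_red, y) - ss_resid(X_full, y)  # SS explained by x

    fit_red = X_red @ np.linalg.lstsq(X_red, y, rcond=None)[0]
    res_red = y - fit_red
    obs = stat(y)
    count = 1                                     # observed value counts toward p
    for _ in range(n_perm):
        y_perm = fit_red + rng.permutation(res_red)  # permute reduced-model residuals
        count += stat(y_perm) >= obs
    return count / (n_perm + 1)

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 25)
y = 5 * x + 0.1 * rng.standard_normal(25)         # strong simulated effect
print(rrpp_pvalue(x, y))                          # small p: effect detected
```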

Phylogenetic Regression: Example

Does head shape covary with body size among Plethodon salamander species?

##           Df      SS       MS  Rsq    F    Z Pr(>Cohen's f-squared)
## svl        1 0.00066 0.000659 0.07 3.03 2.02                  0.012
## Residuals 40 0.00870 0.000217 0.93                                 
## Total     41 0.00936

YES! There is a significant association between head shape and body size in salamanders

Phylogenetic Linear Models & GM-Data

Phylogenetic transformation + RRPP provides analytical generalization of PCMs for high-dimensional data

All manner of linear models can be envisioned: phylogenetic regression, ANOVA, factorial models, etc.

Provides analytical tool for evaluating macroevolutionary trends of shape change in a phylogenetic context

Sets the stage for the development of other PCMMs (perhaps the subject of future discussions)